We have decide to build on the flight data from OpenFlights.org that we used in our first project.
OpenFlights consists of the following datasets:
We have choosen to use the Routes and Airports datasets to build our network. We will use Airline ID and Source Airport ID from the Routes dataset as our two nodes. Each route record will be an edge between the two ntypes of nodes and we will add weighting by the number of routes that each airine has leaving from each particular airport. We will get additional airport node attribute data from the Airports dataset.
Variables used from the Routes dataset:
Variables used from the Airports dataset:
Data is saved as .DAT files. They are UTF-8 encoded.
First import necessary packages for plotting graphs using NetworkX and Matplotlib and set up graph size parameters...
import networkx as nx
from networkx.algorithms import bipartite as bi
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
%matplotlib inline
Read in Routes data to a Pandas dataframe. We only need the Airline and Airport columns so we selected for just those two.
routes_raw = pd.read_csv("Project1_Data/routes.dat",
header=None, sep=",",
usecols=[0, 2],
names=['Airline', 'Airport'])
routes_raw=routes_raw.dropna()
routes_raw.head(15)
We chose to count the number of routes for each eirline at each airport and use that as a weight for our airline to airport edges.
aa = routes_raw.copy()
aa['weight'] = aa.groupby('Airline')['Airport'].transform('count')
aa.head(10)
len(aa)
airports_raw = pd.read_csv("Project1_Data/airports.dat",
header=None, sep=",",
usecols=[1, 2, 3, 4, 6, 7],
names=['Airport Name','City','Country','IATA', 'Latitude', 'Longitude'])
#airports_raw1=airports_raw[airports_raw.Country=='France']
airports_raw.head()
len(airports_raw)
aa_merged = airports_raw.copy()
aa_merged = pd.merge(aa,aa_merged,left_on='Airport',right_on='IATA',how='inner')
aa_merged.head()
airport_nodes = aa_merged[['Airport','Airport Name','City','Country','IATA','Latitude','Longitude']].drop_duplicates().sort_values(by='Airport')
set(aa_merged['Airline']) & set(aa_merged['Airport'])
There are two three letter codes that are used for both airlines and airports, so we have to add differentiation to the names before adding these airport atributes to the graph or all our airlines will also get coded as airports... To keep things short and so we can still read out graph once it's plotted, we added just an L for airLines just to those two airline codes.
#aa_merged['Airline'] = 'L_' + aa_merged.Airline.map(str)
#aa_merged['Airport'] = 'P_' + aa_merged.Airport.map(str)
#airport_nodes['Airport'] = 'P_' + airport_nodes.Airport.map(str)
aa_merged['Airline'] = np.where(
aa_merged['Airline'].isin(['ABJ', 'IOS']),
'L_' + aa_merged.Airline.map(str),
aa_merged.Airline
)
set(aa_merged['Airline']) & set(aa_merged['Airport'])
aa_merged.head()
First we will create a full bipartite network with both nodes.
B = nx.Graph()
B.add_nodes_from(aa_merged['Airline'], bipartite=0)
B.add_nodes_from(aa_merged['Airport'], bipartite=1)
B.add_weighted_edges_from([tuple(d) for d in aa_merged[['Airline','Airport','weight']].values])
nx.set_node_attributes(B, airport_nodes.set_index('Airport').to_dict('index'))
Let's double check if our attributes were saved in the graph data.
list(B.nodes(data=True))[:3]
list(B.nodes(data=True))[-2:]
B.get_edge_data('L_AER','P_KZN')
print(nx.info(B))
airline_nodes = {n for n, d in B.nodes(data=True) if d['bipartite']==0}
airport_nodes = set(B) - airline_nodes
nx.is_connected(B)
bi.is_bipartite(B)
print(bi.density(B, airline_nodes))
print(bi.density(B, airport_nodes))
The functions below will use the edge weights to trim the graph into subgraphs at different minimum weight levels. Any node not connected by at least one ede with the minimum weight will be cut from the graph. We used the textbook functions as a starter, but had to revise them to get them to work...
'''
# textbook function doesn't work...
def trim_edges(g, weight=1):
g2=nx.Graph()
for f, to, edata in g.edges(data=True):
if edata['weight'] > weight:
g2.add_edge(f, to, edata)
return g2
'''
def trim_edges(g, weight=1):
g2=nx.Graph()
my_list=[]
my_list1=[]
for f, to, edata in g.edges(data=True):
if edata['weight'] > weight:
my_list.append(f)
my_list1.append(to)
g2.add_edge(f,to,attr_dict={weight:edata['weight']})
g2.add_nodes_from(my_list, bipartite=0)
g2.add_nodes_from(my_list1, bipartite=1)
return g2
def island_method(g, iterations=5, weight=1):
weights = [edata['weight'] for f, to, edata in g.edges(data=True)]
mn=int(min(weights)) if int(min(weights)) > weight else weight
mx=int(max(weights))
#compute the size of the step, so we get a reasonable step in iterations
step=int((mx-mn)/(iterations-1))
return [[threshold, trim_edges(g, threshold)] for threshold in range(mn,mx,step)]
islands = island_method(B, 4, 350)
print('min weight - ', '# of nodes - ', '# of island subgraphs')
for i in islands:
# print the threshold level, size of the graph, and number of connected components
print(i[0], ' - ', len(i[1]), ' - ', len(list(nx.connected_component_subgraphs(i[1]))))
islands[1][0] # the first index refers to the threshold
islands[1][1] # the second index refers to the graph object
def set_colors(G):
colors = []
for node, data in G.nodes(data=True):
if data['bipartite'] == 1:
colors.append('olivedrab') # Airports in Green
else:
colors.append('cornflowerblue') # Airlines in Blue
return colors
G0=max(nx.connected_component_subgraphs(islands[0][1]), key=len)
plt.rcParams["figure.figsize"] = (15,15) # set plot size
colors = set_colors(G0) # set colors
#weights = [math.log(edata['attr_dict'][200]) for f, t, edata in G0.edges(data=True)] # set weights
nx.draw(G0, with_labels=True, node_color=colors, node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
We can start to see already how many airports are connected by a much smaller number of airlines (as we would expect). We can also see how airlnes serve as boundary spanners between airports.Let's raise the water level though to see if we can gain more insight...
G1=islands[1][1]
plt.rcParams["figure.figsize"] = (15,15) # set plot size
colors = set_colors(G1) # set colors
nx.draw(G1, with_labels=True, node_color=colors, node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
Major airlines are starting to be visible with the airports that they connect. Many cliques are startig to develop and airlines are starting to show as boundary spanners. A higher water level may make it easier to pick out individual airlines and airports.
G2=islands[2][1]
plt.rcParams["figure.figsize"] = (15,15) # set plot size
colors = set_colors(G2) # set colors
nx.draw(G2, with_labels=True, node_color=colors, node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
Here we can really start to see the 4 most influential airlines in the network:
If there are any others they are still hidden in the graph. Air France clearly stands out as a boundary spanner in this graph.
G3=islands[3][1]
plt.rcParams["figure.figsize"] = (8,8) # set plot size
colors = set_colors(G3) # set colors
nx.draw(G3, with_labels=True, node_color=colors, node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
In this last graph we can see that FR, Air France, is the most influential airline in the network.
# compute an affiliation network of the Airports
airports = bi.weighted_projected_graph(B, airport_nodes)
# Find the largest connected subgraph in the network
#users_subgraph = nx.connected_component_subgraphs(users)[0] # textbook code doesn't work
airports_subgraph = max(nx.connected_component_subgraphs(airports), key=len) # alternate method
airports_subgraph.name = "Airports"
print(nx.info(airports_subgraph))
airport_islands = island_method(airports_subgraph, 3, 25)
print('min weight - ', '# of nodes - ', '# of island subgraphs')
for i in airport_islands:
# print the threshold level, size of the graph, and number of connected components
print(i[0], ' - ', len(i[1]), ' - ', len(list(nx.connected_component_subgraphs(i[1]))))
APG0=airport_islands[0][1]
# set plot size
plt.rcParams["figure.figsize"] = (15,15)
nx.draw(APG0, with_labels=True, node_color='olivedrab', node_size=200,
font_size=10, font_weight='bold', edge_color="olivedrab", alpha=0.5)
APG1=airport_islands[1][1]
# set plot size
plt.rcParams["figure.figsize"] = (6,6)
nx.draw(APG1, with_labels=True, node_color='olivedrab', node_size=200,
font_size=10, font_weight='bold', edge_color="olivedrab", alpha=0.5)
LHR or Heathrow Airport in London, England stands out as a boundary spanner in this graph.
APG2=max(nx.connected_component_subgraphs(airport_islands[2][1]), key=len)
# set plot size
plt.rcParams["figure.figsize"] = (2,2)
nx.draw(APG2, with_labels=True, node_color='olivedrab', node_size=200,
font_size=10, font_weight='bold', edge_color="olivedrab", alpha=0.5)
Frankfurt am Main Airport in Frankfurt, Germany and Charles de Gaulle Airport (Roissy Airport) in Paris, France are the two most influential airports in our network.
# compute an affiliation network of the Airlines
airlines = bi.weighted_projected_graph(B, airline_nodes)
# Find the largest connected subgraph in the network
#users_subgraph = nx.connected_component_subgraphs(users)[0] # textbook code doesn't work
airlines_subgraph = max(nx.connected_component_subgraphs(airlines), key=len) # alternate method
airlines_subgraph.name = "Airlines"
print(nx.info(airlines_subgraph))
airline_islands=island_method(airlines_subgraph, 3, 50)
print('min weight - ', '# of nodes - ', '# of island subgraphs')
for i in airline_islands:
# print the threshold level, size of the graph, and number of connected components
print(i[0], ' - ', len(i[1]), ' - ', len(list(nx.connected_component_subgraphs(i[1]))))
ALG0=max(nx.connected_component_subgraphs(airline_islands[0][1]), key=len)
# set plot size
plt.rcParams["figure.figsize"] = (15,15)
nx.draw(ALG0, with_labels=True, node_color='cornflowerblue', node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
ALG1=max(nx.connected_component_subgraphs(airline_islands[1][1]), key=len)
# set plot size
plt.rcParams["figure.figsize"] = (6,6)
nx.draw(ALG1, with_labels=True, node_color='cornflowerblue', node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
The seven most influential airlines in the network in alphabetical order are:
ALG2=max(nx.connected_component_subgraphs(airline_islands[2][1]), key=len)
# set plot size
plt.rcParams["figure.figsize"] = (2,2)
nx.draw(ALG2, with_labels=True, node_color='cornflowerblue', node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
The two most influential airlines in the network are clearly US and AA corresponding to US Airways and American Airlines respectively. Interestingly these do not correspond to the most influential airline in the bipartite network graph above, which was FR for Air France.